Introduction: Reliability-Aware Microarchitecture
Authors
Sarita V. Adve
Abstract
It is becoming increasingly difficult to achieve expected levels of reliability and data correctness as the industry approaches the era of extreme CMOS scaling. Aging-related device degradation is becoming a real threat to lifetime reliability. Many processors already include solutions for soft errors in memory structures; more recently, soft errors in logic paths have become an increasing concern. Process variability is challenging conventional design: the law of large numbers is no longer useful for describing device behavior, and design based on worst-case paths becomes impractical. Impending problems with burn-in and with temperature-induced timing effects are additional threats to reliable operation. In the past, architects have often left reliability concerns to lower levels of the system stack. As these problems grow more severe, however, low-level solutions will likely not suffice, and straightforward system-level solutions such as blind redundancy will likely be too expensive for most market segments. Architects will therefore need to make reliability a first-class design constraint and develop new, cost-effective approaches to reliability-aware design. Compared to device- and circuit-level solutions, architecture-level solutions can more easily exploit application-specific behavior. For example, for common applications a large fraction of raw soft errors is masked at the architecture level, potentially allowing lower-cost solutions at this level. As another example, architecture-level solutions offer the opportunity for application-driven dynamic lifetime reliability management, allowing an optimal distribution of failure rates in time and space.
Publication date: 2005